Prediction of protein subcellular multisite localization using a new feature extraction method.

Authors

  • L Y Wang
  • D Wang
  • Y H Chen
Abstract

A basic problem in proteomics is identifying the subcellular locations of a protein. The problem is complicated by the fact that some proteins exist simultaneously in two or more subcellular locations. Improving multisite prediction quality requires effective feature extraction methods. Here, we developed a new feature extraction method based on the pK values and frequencies of amino acids to represent a protein as a real-valued vector. With these features, we applied the multi-label k-nearest neighbors (ML-KNN) algorithm and a variant that assigns different weights to different attributes, called wML-KNN, to predict multiplex protein subcellular locations. The best overall accuracy was 59.92% on dataset S1, taken from the Virus-mPLoc predictor, and 86.04% on dataset S2, taken from Gpos-mPLoc.
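
The abstract does not give the formulas, so the following Python sketch is only an illustration of the general idea under stated assumptions, not the authors' method. It assumes the feature vector is the 20 amino acid frequencies followed by two frequency-weighted mean pK values, and it replaces ML-KNN's probabilistic decision rule with a plain majority vote over neighbours found with an attribute-weighted distance. The names `extract_features`, `predict_labels`, and `PK_TABLE` are hypothetical, and the pK table lists only two residues with commonly cited textbook values; a real table would cover all 20.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: (pK1, pK2) of the alpha-carboxyl and alpha-amino groups.
# Only glycine and alanine are listed here, purely for illustration.
PK_TABLE = {"G": (2.34, 9.60), "A": (2.34, 9.69)}


def extract_features(sequence, pk_table=PK_TABLE):
    """Return 20 residue frequencies followed by two mean pK values."""
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    freqs = [counts[aa] / total for aa in AMINO_ACIDS]
    # Average each pK over the residues that have an entry in the table.
    covered = sum(counts[aa] for aa in pk_table)
    if covered == 0:
        mean_pk = [0.0, 0.0]
    else:
        mean_pk = [
            sum(counts[aa] * pk_table[aa][j] for aa in pk_table) / covered
            for j in (0, 1)
        ]
    return freqs + mean_pk


def weighted_distance(x, y, weights):
    """Euclidean distance with one non-negative weight per attribute."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))


def predict_labels(query, train_vectors, train_labels, weights, k=3):
    """Keep every label carried by more than half of the k nearest neighbours.

    This is a simplified vote, not the MAP estimate used by ML-KNN.
    """
    ranked = sorted(
        range(len(train_vectors)),
        key=lambda i: weighted_distance(query, train_vectors[i], weights),
    )
    neighbours = ranked[:k]
    votes = Counter(label for i in neighbours for label in train_labels[i])
    return {label for label, n in votes.items() if n > len(neighbours) / 2}
```

In this sketch a protein with several locations simply carries a set of labels, so a query can receive more than one predicted location. Setting all weights to 1 gives the unweighted vote, while raising the weight of an attribute makes differences in that attribute count more when neighbours are ranked.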

Similar resources

Multisite protein subcellular localization prediction based on entropy density.

Protein subcellular localization prediction is currently receiving much attention in the field of protein research. Many researchers have put great effort into studying single-site protein subcellular localization, but experimental data show that many proteins can be found in two or more subcellular locations, prompting the study of multisite protein subcellular localization. This study utilized...

Feature Weighting-based Classifier for Protein Subcellular Localization

Protein subcellular localization prediction plays an important role in understanding the functions of proteins and the biological processes they are involved in. Using protein sequence information, we can predict which location a protein belongs to. In this paper, we propose a new linear classifier for predicting subcellular localizations of proteins using improved features extracted from protein sequences...

Prediction of subcellular localization of proteins using pairwise sequence alignment and support vector machine

Predicting the destination of a protein in a cell is important for annotating the function of the protein. Recent advances have allowed us to develop more accurate methods for predicting the subcellular localization of proteins. One of the most important factors for improving the accuracy of these methods is related to the introduction of new useful features for protein sequences. In this paper...

Sequence-driven features for prediction of subcellular localization of proteins

Prediction of the cellular location of a protein plays an important role in inferring the function of the protein. Feature extraction is a critical part in prediction systems, requiring raw sequence data to be transformed into appropriate numerical feature vectors while minimizing information loss. In this paper we present a method for extracting useful features from protein sequence data. The ...

Prediction of Protein Subcellular Multi-locations with a Min-Max Modular Support Vector Machine

Predicting the subcellular multi-locations of proteins with machine learning techniques is a challenging problem in the computational biology community. Treating the protein multi-location problem as a multi-label pattern classification problem, we propose a new prediction method for the protein subcellular localization problem in this paper. Two key points of the proposed method are ...

Journal:
  • Genetics and molecular research : GMR

Volume 15, Issue 3

Pages: -

Publication date: 2016